Rebuild GPU software for all supported combinations of CPU and CUDA compute capabilities #969
ocaisa wants to merge 4 commits into EESSI:2023.06-software.eessi.io
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 accel:nvidia/cc80
local function using_eessi_accel_stack (t)
    if not os.getenv("EESSI_SKIP_ACCELERATOR_WARNING") then
        local fullName = t.modFullName
        local moduleFilePath = t.fn
        -- Check if we are using an EESSI version 2023 accelerator stack by checking the moduleFilePath is
        -- a path that starts with /cvmfs/software.eessi.io/versions and contains accel/nvidia/ccNN
        if string.sub(moduleFilePath, 1, 33) == "/cvmfs/software.eessi.io/versions" then
            if string.find(moduleFilePath, "accel/nvidia/cc%d%d") then
                -- right now we print this for all cases, but eventually we should only
                -- print this for accelerators we do _not_ test
                local advice = fullName .. " has not been tested for " .. os.getenv("EESSI_SOFTWARE_SUBDIR")
                advice = advice .. " with " .. string.match(moduleFilePath, "accel/nvidia/cc%d%d")
                advice = advice .. " but is likely to work.\\n"
                advice = advice .. "(Silence this message by setting the environment variable "
                advice = advice .. "EESSI_SKIP_ACCELERATOR_WARNING)"
                LmodMessage(advice)
            end
        end
    end
end
Should this be moved to an EasyBuild hook instead, since what we can test on may change over time?
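For reference, a rough sketch of what the same check could look like in Python, the language EasyBuild hooks are written in. This only mirrors the path test and message from the Lua function above; the name `accel_warning` is hypothetical (it is not an EasyBuild or Lmod API), and how it would be wired into a hook is left open.

```python
import os
import re

EESSI_PREFIX = "/cvmfs/software.eessi.io/versions"
# Same pattern as the Lua "accel/nvidia/cc%d%d": two ASCII digits after "cc"
ACCEL_PATTERN = re.compile(r"accel/nvidia/cc[0-9]{2}")


def accel_warning(mod_full_name, module_file_path, env=None):
    """Return the advice text for an accelerator module, or None.

    Hypothetical helper: mirrors using_eessi_accel_stack, but returns the
    message instead of printing it.
    """
    env = os.environ if env is None else env
    # As in the Lua version, the variable only needs to be set (any value)
    if "EESSI_SKIP_ACCELERATOR_WARNING" in env:
        return None
    if not module_file_path.startswith(EESSI_PREFIX):
        return None
    match = ACCEL_PATTERN.search(module_file_path)
    if match is None:
        return None
    # Assumption: fall back to a placeholder if the subdir variable is unset
    subdir = env.get("EESSI_SOFTWARE_SUBDIR", "<unknown>")
    return (
        f"{mod_full_name} has not been tested for {subdir} "
        f"with {match.group(0)} but is likely to work.\n"
        "(Silence this message by setting the environment variable "
        "EESSI_SKIP_ACCELERATOR_WARNING)"
    )
```

Returning the text rather than printing it keeps the "which accelerators do we test" decision separate from the reporting, which is the part that is expected to change over time.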
Not sure how this is possible, but EasyBuild is failing to apply the patch to the GROMACS sources:

I don't get it; if I do it manually it works just fine.
@ocaisa Can you clarify in the PR description why these rebuilds are necessary? What has changed to require us rebuilding all of this? |
I think it's because the "start" dir is wrong somehow, it should be |
The |
GROMACS is an iterated installation, and applying the patch is failing in the 2nd iteration because the build directory is not fully removed when the 2nd iteration starts (the …).
But I don't see how this is only a problem now... Maybe we somehow break the cleanup through our hooks?
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 accel:nvidia/cc80
@ocaisa Can you split up and retarget this PR?
I think we will just come back to doing this naturally in the near future |
We're not going to be able to test all possible CPU/GPU combinations, but we need a general approach that lets us move forward while testing only a subset and still providing builds for more combinations. This PR is intended to put this workflow in place and begin rebuilding all GPU packages to reflect the changes.
accel subdir (with advice about what to do)
NVCC_PREPEND_FLAGS='-arch=sm_XX' for the build (this probably has fallout, so it should be considered a nice-to-have)
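As an illustration of the `NVCC_PREPEND_FLAGS` item, here is a minimal sketch of how the `sm_XX` value could be derived from an accelerator target such as `nvidia/cc80`. The helper name `nvcc_prepend_flags` is hypothetical, and the sketch assumes the two-digit `ccNN` naming used in the accel paths; it does not show where in the build the variable would be set.

```python
import re


def nvcc_prepend_flags(accel_target):
    """Map an accelerator target like 'nvidia/cc80' to '-arch=sm_80'.

    Hypothetical helper; assumes the 'nvidia/ccNN' form with two digits.
    """
    match = re.fullmatch(r"nvidia/cc([0-9]{2})", accel_target)
    if match is None:
        raise ValueError(f"unsupported accelerator target: {accel_target!r}")
    return f"-arch=sm_{match.group(1)}"


# Build an environment fragment for a cc80 build
build_env = {"NVCC_PREPEND_FLAGS": nvcc_prepend_flags("nvidia/cc80")}
```

Rejecting anything that does not match the expected form avoids silently passing a malformed `-arch` value to `nvcc`.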